#Dynamic Memory Sparsification
11/06/2025
NVIDIA Unveils Dynamic Memory Sparsification for 8× Compression of Transformer KV Caches
NVIDIA researchers have developed Dynamic Memory Sparsification (DMS), a novel method that compresses the key-value (KV) caches of Transformer-based LLMs by 8×, improving inference efficiency while maintaining accuracy.
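
To give a sense of scale for the 8× figure, here is a rough back-of-the-envelope sketch of KV cache memory. The model shape (32 layers, 8 KV heads, head dimension 128, fp16, 32,768-token context) and the function names are illustrative assumptions, not values from NVIDIA's work, and the compression ratio is treated as a plain divisor on cache size. It says nothing about how DMS decides what to keep.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate size of an uncompressed KV cache in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def compressed_kv_cache_bytes(full_bytes, compression_ratio=8):
    """Cache size after compression, modeling the ratio as a simple divisor."""
    return full_bytes // compression_ratio


# Hypothetical model shape, chosen only for illustration.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)
small = compressed_kv_cache_bytes(full, compression_ratio=8)

GIB = 1024 ** 3
print(f"uncompressed: {full / GIB:.2f} GiB")   # 4.00 GiB
print(f"8x compressed: {small / GIB:.2f} GiB")  # 0.50 GiB
```

Under these assumptions, a single 32K-token sequence drops from about 4 GiB of cache to about 0.5 GiB, which is the kind of headroom that allows longer contexts or larger batches on the same hardware.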